-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use trie for prefix matching #851
Conversation
…ting subscribers from the trie
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
publisher.go
Outdated
if _, ok := updatedIDs[id]; ok { | ||
continue | ||
} | ||
kvs, ok := batchedUpdates[id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ineffectual assignment to kvs
(from ineffassign
)
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
@gitlw PTAL I updated the PR.(couldn't able to push the update on your branch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)
publisher.go, line 22 at r4 (raw file):
"sync" "github.com/dgraph-io/badger/trie"
minor: move this to in between the pb and y imports.
publisher.go, line 88 at r4 (raw file):
for _, e := range req.Entries { ids := p.indexer.Get(e.Key) // trie can give dulplicate ids
start comments with uppercase. also periods.
Same for the other commetns.
publisher.go, line 92 at r4 (raw file):
// if we searching the trie with hello, we'll be getting [1,1] // but we have to send only one update for that we're removing // duplicate id
duplicate ids.
publisher.go, line 95 at r4 (raw file):
updatedIDs := make(map[uint64]struct{}) for _, id := range ids { // ignoring if the entry is already updated for the id
minor: uppercase and period at the end.
publisher.go, line 149 at r4 (raw file):
func (p *publisher) sendUpdates(reqs []*request) { // TODO: Prefix check before pushing into pubCh.
I see this TODO removed but there's no code changes nearby. Is this TODO addressed somewhere else?
trie/trie.go, line 4 at r4 (raw file):
type node struct { childrens map[byte]*node
s/childrens/children
trie/trie.go, line 15 at r4 (raw file):
} // Trie datastructure
minor: period at the end.
trie/trie.go, line 20 at r4 (raw file):
} // NewTrie returns trie
// NewTrie returns a new trie instance.
minor: period at the end.
trie/trie.go, line 27 at r4 (raw file):
} // Add adds the item in the trie for the given prefix path
minor: period at the end.
trie/trie.go, line 41 at r4 (raw file):
} // Get returns prefix matched items for the given key
minor: period at the end.
trie/trie.go, line 43 at r4 (raw file):
// Get returns prefix matched items for the given key func (t *Trie) Get(key []byte) []uint64 { o := []uint64{}
what's o? Maybe a more descriptive name would help.
trie/trie.go, line 56 at r4 (raw file):
} // Delete will delete the item if the item exist in the given index path
minor: period at the end.
trie/trie_test.go, line 18 at r4 (raw file):
ids := trie.Get([]byte("hel")) if len(ids) != 1 { t.Errorf("expected number of ids 1 but got %d", len(ids))
use require instead.
require.Equal(t, 1, len(ids))
Same for the rest of the file.
trie/trie_test.go, line 19 at r4 (raw file):
if len(ids) != 1 { t.Errorf("expected number of ids 1 but got %d", len(ids)) return
Also, don't manually return early from a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a few comments.
Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @ashish-goswami and @Sch00lb0y)
publisher.go, line 93 at r4 (raw file):
// but we have to send only one update for that we're removing // duplicate id updatedIDs := make(map[uint64]struct{})
Trie should not be returning duplicate ids.
trie/trie.go, line 1 at r4 (raw file):
package trie
Add license.
trie/trie.go, line 28 at r4 (raw file):
// Add adds the item in the trie for the given prefix path func (t *Trie) Add(prefix []byte, item uint64) {
s/item/id
trie/trie.go, line 38 at r4 (raw file):
curr = n } curr.items = append(curr.items, item)
Might want to check if the id already exists in items.
trie/trie.go, line 53 at r4 (raw file):
curr = n } return o
You can sort the output and remove duplicates here.
trie/trie.go, line 68 at r4 (raw file):
//NOTE: we're just removing the item not the hanging path old := curr.items new := []uint64{}
Don't call it new to avoid shadowing the keyword.
trie/trie_test.go, line 1 at r4 (raw file):
package trie
license.
- Fixing comment Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @manishrjain, @martinmr, and @Sch00lb0y)
publisher.go, line 88 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
start comments with uppercase. also periods.
Same for the other commetns.
Done.
publisher.go, line 92 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
duplicate ids.
Done.
publisher.go, line 93 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Trie should not be returning duplicate ids.
Done.
publisher.go, line 149 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I see this TODO removed but there's no code changes nearby. Is this TODO addressed somewhere else?
Yes, we're using Trie in publishUpdates
trie/trie.go, line 1 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add license.
Done.
trie/trie.go, line 4 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
s/childrens/children
Done.
trie/trie.go, line 20 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// NewTrie returns a new trie instance.
minor: period at the end.
Done.
trie/trie.go, line 28 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
s/item/id
Done.
trie/trie.go, line 38 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Might want to check if the id already exists in items.
Done.
trie/trie.go, line 43 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
what's o? Maybe a more descriptive name would help.
Done.
trie/trie.go, line 53 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
You can sort the output and remove duplicates here.
Done.
trie/trie.go, line 68 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Don't call it new to avoid shadowing the keyword.
Done.
trie/trie_test.go, line 18 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use require instead.
require.Equal(t, 1, len(ids))
Same for the rest of the file.
Done.
trie/trie_test.go, line 19 at r4 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Also, don't manually return early from a test.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)
publisher.go, line 92 at r5 (raw file):
} k := y.SafeCopy(nil, e.Key) kv := &pb.KV{
KV is meant to not be modified, so we don't need to copy it. We can just create it once and append it to batched updates' KVList.
trie/trie.go, line 68 at r4 (raw file):
Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…
Done.
Same slice trick here as well.
trie/trie.go, line 54 at r5 (raw file):
curr.children[val] = n } curr = n
child, ok := node.children[val]
if !ok { child = newNode(); node.children[val] = child}
node = child
trie/trie.go, line 74 at r5 (raw file):
return out[i] < out[j] }) res := []uint64{}
res := out[:0]
You can use a Slice trick to avoid creating another slice.
https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating
trie/trie.go, line 88 at r5 (raw file):
curr := t.root for _, val := range index { n, ok := curr.children[val]
See above about naming of node and child.
trie/trie.go, line 94 at r5 (raw file):
curr = n } //NOTE: We're just removing the id not the hanging path.
// We're just removing...
Remove the "NOTE:".
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, @martinmr, and @Sch00lb0y)
publisher.go, line 92 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
KV is meant to not be modified, so we don't need to copy it. We can just create it once and append it to batched updates' KVList.
Done.
trie/trie.go, line 54 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
child, ok := node.children[val] if !ok { child = newNode(); node.children[val] = child} node = child
Done.
trie/trie.go, line 74 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
res := out[:0]
You can use a Slice trick to avoid creating another slice.
https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating
Done.
trie/trie.go, line 88 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
See above about naming of node and child.
Done.
trie/trie.go, line 94 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
// We're just removing...
Remove the "NOTE:".
Done.
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments.
Reviewed 2 of 2 files at r6.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Sch00lb0y)
publisher.go, line 92 at r5 (raw file):
Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…
Done.
That doesn't address it.
I meant if you see multiple subscribers needing this KV, you can create it once and just append the pointer to their KV. You don't need to recreate pb.KV for every one of the ids.
trie/trie.go, line 43 at r4 (raw file):
// Get returns prefix matched items for the given key func (t *Trie) Get(key []byte) []uint64 { o := []uint64{}
Using map would be better. In fact, you can just return the map instead of returning a list. The map can be of type map[uint64]struct{}
.
This would avoid having to do the sorting, then removal of duplicates, etc.
trie/trie.go, line 56 at r6 (raw file):
node = child } node.ids = append(node.ids, id)
Add a comment that we only need to add id to the last node for this prefix.
Could you please link to an issue describing what this change is fixing? Thanks! |
trie/trie.go, line 96 at r6 (raw file):
Using range form of loop will be better ? |
trie/trie.go
Outdated
|
||
// Get returns prefix matched ids for the given key. | ||
func (t *Trie) Get(key []byte) map[uint64]struct{} { | ||
out := make(map[uint64]struct{}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S1019: should use make(map[uint64]struct{}) instead (from gosimple
)
- reuse the kv across all the subscribers Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 6 unresolved discussions (waiting on @animesh2049, @manishrjain, @martinmr, and @Sch00lb0y)
trie/trie.go, line 43 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Using map would be better. In fact, you can just return the map instead of returning a list. The map can be of type
map[uint64]struct{}
.This would avoid having to do the sorting, then removal of duplicates, etc.
Done.
trie/trie.go, line 56 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment that we only need to add id to the last node for this prefix.
Done.
trie/trie.go, line 96 at r6 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
Using range form of loop will be better ?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a few comments. Address those and get someone in the team to merge it.
Can't wait for you to join the team!
Reviewed 2 of 3 files at r7, 1 of 1 files at r8.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)
publisher.go, line 87 at r8 (raw file):
for _, e := range req.Entries { ids := p.indexer.Get(e.Key) if len(ids) != 0 {
if len(ids) > 0
publisher.go, line 98 at r8 (raw file):
for id := range ids { if _, ok := batchedUpdates[id]; !ok { batchedUpdates[id] = &pb.KVList{}
Instead of creating an empty KVList every time, you can create an empty KVList above as a const or var, and just assign these to it.
trie/trie.go, line 56 at r6 (raw file):
Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…
Done.
I don't see your comment. Why did you mark this as "Done"?
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
- change not equal Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 6 unresolved discussions (waiting on @animesh2049, @manishrjain, and @Sch00lb0y)
publisher.go, line 87 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
if len(ids) > 0
Done.
publisher.go, line 98 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Instead of creating an empty KVList every time, you can create an empty KVList above as a const or var, and just assign these to it.
Sorry, I didn't understand this part :(
Could you explain this part?
trie/trie.go, line 56 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I don't see your comment. Why did you mark this as "Done"?
Sorry, some mix-up happened. Instead of Add, I added the comment to Get :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r9.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)
* Add some TODOs for pubsub API. * Making the Loader API public * Adding the trie data structure for publishers, but got stuck for deleting subscribers from the trie * use trie for prefix matching Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * delete id from trie Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * fix ineffassign Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * - trie return distinct element. - Fixing comment Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * remove allocation for finding distict element Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * change to the convention Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * - use map instead of sorting to find distinct element - reuse the kv across all the subscribers Signed-off-by: பாலாஜி <rbalajis25@gmail.com> * - add comment - change not equal Signed-off-by: பாலாஜி <rbalajis25@gmail.com> (cherry picked from commit c32e701)
Close #868
This change is